1.1.2 Gradient Approximation

As described in Section 1.1.1, when updating the parameters in BNNs and 1-bit networks, the full-precision weights are updated with the gradient $\frac{\partial C}{\partial \omega_b}$. Forward propagation, however, places a sign function between the full-precision weights and the binarized weights, so the gradient of the sign function must be taken into account when updating the full-precision weights. Note that the derivative of the sign function is zero everywhere except at zero, where it is infinite; therefore, a differentiable function is widely used to approximate the sign function.

The first to address this problem in a 1-bit network was BinaryNet [99]. Assume that an estimator $g_q$ of the gradient $\frac{\partial C}{\partial q}$, where $q = \mathrm{Sign}(r)$, has been obtained. Then the straight-through estimator of $\frac{\partial C}{\partial r}$ is simply

$$g_r = g_q \, 1_{|r| \le 1}, \qquad (1.3)$$

where $1_{|r| \le 1}$ equals 1 when $|r| \le 1$ and 0 otherwise. This can also be seen as propagating the gradient through hard tanh, a piecewise-linear activation function.
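As an illustration, the straight-through estimator of Eq. (1.3) can be expressed as a custom autograd function. The following is a minimal PyTorch-style sketch written for this discussion, not code from BinaryNet; the class and function names are our own.

```python
import torch


class SignSTE(torch.autograd.Function):
    """Sign in the forward pass; straight-through estimator (Eq. 1.3) in the backward pass."""

    @staticmethod
    def forward(ctx, r):
        ctx.save_for_backward(r)
        return torch.sign(r)

    @staticmethod
    def backward(ctx, g_q):
        (r,) = ctx.saved_tensors
        # g_r = g_q * 1_{|r| <= 1}: pass the gradient through only where |r| <= 1,
        # which is exactly the gradient of hard tanh.
        return g_q * (r.abs() <= 1).to(g_q.dtype)


def binarize(r):
    return SignSTE.apply(r)
```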

Bi-real Net [159] approximates the derivative of the sign function for activations. Instead of using Htanh [99] to approximate the sign function, Bi-real Net uses a piecewise polynomial function for a better approximation.
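A minimal sketch of this idea is given below, assuming the commonly cited form of Bi-real Net's piecewise polynomial, whose derivative is $2 - 2|r|$ on $[-1, 1]$ and 0 elsewhere; the class name and the autograd wrapper are illustrative choices, not the authors' code.

```python
import torch


class ApproxSign(torch.autograd.Function):
    """Sign in the forward pass; derivative of a piecewise polynomial
    (2 - 2|r| on [-1, 1], 0 elsewhere) in the backward pass."""

    @staticmethod
    def forward(ctx, r):
        ctx.save_for_backward(r)
        return torch.sign(r)

    @staticmethod
    def backward(ctx, g_q):
        (r,) = ctx.saved_tensors
        grad = (2 - 2 * r.abs()).clamp(min=0)
        return g_q * grad
```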

Bi-real Net also proposes a magnitude-aware gradient for weights. When training BNNs, the gradient $\frac{\partial C}{\partial W}$ is related only to the sign of the weights and is independent of their magnitude. Bi-real Net therefore replaces the sign function with a magnitude-aware function.
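A hedged sketch of what such a magnitude-aware replacement might look like is shown below; the per-tensor scale and the inline straight-through trick are simplifying assumptions, not the exact formulation used in Bi-real Net.

```python
import torch


def magnitude_aware_sign(w: torch.Tensor) -> torch.Tensor:
    # Replace sign(w) by a magnitude-aware surrogate: the binary values are
    # scaled by the mean absolute value of the weights, so the gradient with
    # respect to w depends on the weight magnitudes and not only on their signs.
    # Per-tensor scaling is an assumption made for brevity.
    scale = w.abs().mean()
    # straight-through sign: forward = sign(w), backward = identity
    sign_ste = (torch.sign(w) - w).detach() + w
    return scale * sign_ste
```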

Xu et al. [266] use a higher-order approximation for weight binarization. They propose a long-tailed approximation for activation binarization as a trade-off between tight approximation and smooth backpropagation.

Differentiable Soft Quantization (DSQ) [74] also introduces a function, called differentiable soft quantization, to approximate the standard binary and uniform quantization process. DSQ employs hyperbolic tangent functions to gradually approach the staircase function used for low-bit quantization (the sign function in a 1-bit CNN). The binary DSQ function is as follows:

$$Q_s(x) = \begin{cases} -1, & x < -1 \\ 1, & x > 1 \\ s\tanh(kx), & \text{otherwise} \end{cases} \qquad (1.4)$$

with

$$k = \frac{1}{2}\log\Big(\frac{2}{\alpha} - 1\Big), \qquad s = \frac{1}{1-\alpha}. \qquad (1.5)$$

In particular, when α is small, DSQ closely approximates the behavior of uniform quantization. This means that a suitable α allows DSQ to help train a quantized model with higher accuracy. Note that DSQ is differentiable, so its derivative can be used directly when updating the parameters.
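For concreteness, the binary DSQ function of Eqs. (1.4)-(1.5) can be written directly; the clamp below is equivalent to the piecewise definition because $s\tanh(k \cdot 1) = 1$ exactly. The function name and the default value of α are illustrative choices.

```python
import math

import torch


def binary_dsq(x: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    """Binary DSQ of Eqs. (1.4)-(1.5): a scaled tanh that tends to sign(x) as alpha -> 0."""
    k = 0.5 * math.log(2.0 / alpha - 1.0)
    s = 1.0 / (1.0 - alpha)
    y = s * torch.tanh(k * x)
    # Clipping the tails reproduces the -1 / +1 branches of Eq. (1.4),
    # since |s * tanh(k * x)| reaches 1 exactly at |x| = 1.
    return torch.clamp(y, -1.0, 1.0)
```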

To summarize the methods above: they all introduce a differentiable function to approximate the sign function in BinaryConnect so that the gradient with respect to the full-precision weights or activations can be computed more accurately. As a result, the BNN or 1-bit network converges more easily during training, and the network performance improves.

1.1.3 Quantization

BinaryConnect and BinaryNet use simple quantization methods. After the full-precision weights are updated, the new binary weights are generated by taking the sign of the real-valued weights. But when the binary weights are determined only by the sign of the full-precision weights,